implement server agent draining #795
base: master
Signed-off-by: Imran Pochi <[email protected]>
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ipochi. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
I'm happy to see interest in completing this feature! When I was looking into this, here are the questions I ran into:
That potential agent change is probably an improvement but needs careful risk analysis.
|
Thank you for the feedback, @jkh52. Your point 1 hadn't crossed my mind. It seems logical to stop the agent sync loop if the agent is draining. I'll give more thought to whether the server should do more so that it doesn't get into a conflicted state, or whether the agent should stop the sync loops. On point 2, I think it's a valid point to continue using the draining agent if there are no non-draining agents. It's akin to falling back on the current behaviour.
This commit adds a fallback for the case where all the agents in the system are draining. Rather than dropping the request with an error, we fall back to the existing behavior, i.e. we continue to send the request to the agent even if it's draining. As for the agent-side issue, if the agent has sent the DRAIN signal to the server, ideally it should stop doing syncOnce with the server. Otherwise the server mistakenly concludes that the agent is ready again. Signed-off-by: Imran Pochi <[email protected]>
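The fallback described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `agent` type, the `pickAgent` function, and `errNoAgents` are hypothetical names, and the real server tracks backends differently.

```go
package main

import (
	"errors"
	"fmt"
)

// agent is a hypothetical stand-in for a server-side backend connection.
type agent struct {
	id       string
	draining bool // set when the server receives a DRAIN signal from this agent
}

// errNoAgents is a hypothetical error for the case where no agent is connected.
var errNoAgents = errors.New("no agents available")

// pickAgent prefers the first non-draining agent. If every agent is draining,
// it falls back to the first draining one instead of failing the request.
func pickAgent(agents []*agent) (*agent, error) {
	var fallback *agent
	for _, a := range agents {
		if !a.draining {
			return a, nil
		}
		if fallback == nil {
			fallback = a
		}
	}
	if fallback != nil {
		return fallback, nil
	}
	return nil, errNoAgents
}

func main() {
	agents := []*agent{{id: "a", draining: true}, {id: "b"}}

	picked, _ := pickAgent(agents)
	fmt.Println(picked.id) // the non-draining agent is preferred

	agents[1].draining = true
	picked, _ = pickAgent(agents)
	fmt.Println(picked.id) // all agents draining: first draining agent is used
}
```

An error is returned only when the list is empty, which mirrors the idea that draining agents remain usable as a last resort.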
|
Current PR limitations:
Expanding on 1 (agent-side implementation): the simplest option is to stop the agent from doing connectOnce if the agent is draining. I don't want to complicate the logic by making the agent do anything complex, like sending DIAL_CLS to stop accepting new connections, because the agent doesn't have a view of the whole system (it may be a one-agent setup, or all agents may be draining). In that case, server retries combined with the agent doing something fancy could lead to a deadlock (back and forth). Expanding on the fallback (when all agents, or the only agent, are in the draining state): I think the simplest option would again be to pick the first draining agent found and send the request to it, even though the agent is draining. These limitations are now addressed in the second commit of the PR. Please review and let me know your thoughts.
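To make the agent-side idea concrete, here is a small sketch of a sync iteration that skips the connection attempt once the agent is draining. It assumes Go 1.19+ for `atomic.Bool`. The `agentClient` type, its fields, and `syncTick` are hypothetical; only the name `connectOnce` comes from the discussion above, and its body here is a placeholder.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// agentClient is a hypothetical, simplified agent; it is not the real client type.
type agentClient struct {
	draining atomic.Bool // set once the agent has sent DRAIN to the server
	connects int         // counts connection attempts, for illustration only
}

// connectOnce is a placeholder for the real connection attempt.
func (c *agentClient) connectOnce() { c.connects++ }

// syncTick runs one iteration of the sync loop and reports whether it
// attempted a connection. A draining agent skips the attempt, so the server
// does not see a fresh connection and treat the agent as ready again.
func (c *agentClient) syncTick() bool {
	if c.draining.Load() {
		return false
	}
	c.connectOnce()
	return true
}

func main() {
	c := &agentClient{}
	fmt.Println(c.syncTick()) // connects while not draining

	c.draining.Store(true)
	fmt.Println(c.syncTick()) // skipped while draining
	fmt.Println(c.connects)   // only the first tick connected
}
```

Keeping the check local to the agent avoids any extra protocol messages, which is in line with the concern about the agent not having a view of the whole system.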
Currently, the server only logs when it receives a drain request from an agent. Ideally, in that scenario, the server should mark the agent as draining and not select it for new requests. This PR implements that.